18 research outputs found
What you say and how you say it : joint modeling of topics and discourse in microblog conversations
This paper presents an unsupervised framework for jointly modeling topic content and discourse behavior in microblog conversations. Concretely, we propose a neural model to discover word clusters indicating what a conversation concerns (i.e., topics) and those reflecting how participants voice their opinions (i.e., discourse). Extensive experiments show that our model can yield both coherent topics and meaningful discourse behavior. Further study shows that our topic and discourse representations can benefit the classification of microblog messages, especially when they are jointly trained with the classifier.
Code Structure Guided Transformer for Source Code Summarization
Code summaries help developers comprehend programs and reduce their time to
infer the program functionalities during software maintenance. Recent efforts
resort to deep learning techniques such as sequence-to-sequence models for
generating accurate code summaries, among which Transformer-based approaches
have achieved promising performance. However, effectively integrating the code
structure information into the Transformer is under-explored in this task
domain. In this paper, we propose a novel approach named SG-Trans to
incorporate code structural properties into the Transformer. Specifically, we
inject the local symbolic information (e.g., code tokens and statements) and
global syntactic structure (e.g., data flow graph) into the self-attention
module of the Transformer as inductive bias. To further capture the hierarchical
characteristics of code, the local information is assigned to the attention
heads of the lower layers and the global structure to those of the higher
layers. Extensive evaluation shows the superior performance of SG-Trans
over the state-of-the-art approaches. Compared with the best-performing
baseline, SG-Trans improves the METEOR score, a metric widely used for
measuring generation quality, by 1.4% and 2.0% on the two benchmark
datasets, respectively.
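
The abstract does not say how the structural information enters self-attention. One common way to impose such an inductive bias is to mask the attention scores so that a token only attends to structurally related tokens. The sketch below illustrates that idea under this assumption and is not the SG-Trans implementation: the function names, the same-statement rule for local structure, and the edge list standing in for a data flow graph are all hypothetical.

```python
import math

def same_statement_mask(statement_ids):
    """Local structure (assumed rule): token i may attend to token j
    only if both tokens belong to the same statement."""
    n = len(statement_ids)
    return [[statement_ids[i] == statement_ids[j] for j in range(n)]
            for i in range(n)]

def data_flow_mask(n, edges):
    """Global structure (assumed rule): token i may attend to itself and
    to any token it shares a data-flow edge with."""
    mask = [[i == j for j in range(n)] for i in range(n)]
    for a, b in edges:
        mask[a][b] = True
        mask[b][a] = True
    return mask

def masked_attention_weights(scores, mask):
    """Row-wise softmax over raw attention scores, restricted to the
    positions that the mask allows. Masked positions get weight 0."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        allowed = [s for s, m in zip(row_scores, row_mask) if m]
        peak = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy input: 4 tokens; tokens 0-1 form statement 0, tokens 2-3 form statement 1.
statement_ids = [0, 0, 1, 1]
edges = [(0, 3)]                        # hypothetical data-flow edge
scores = [[0.0] * 4 for _ in range(4)]  # uniform raw scores for clarity

# A "lower-layer" head restricted by local structure,
# and a "higher-layer" head restricted by global structure.
local_weights = masked_attention_weights(scores, same_statement_mask(statement_ids))
global_weights = masked_attention_weights(scores, data_flow_mask(4, edges))
```

With uniform scores, the local head splits the attention of token 0 evenly between tokens 0 and 1, while the global head splits it between tokens 0 and 3. In a full model, each mask would be applied to the scaled dot-product scores of the heads it is assigned to.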